Collaborating Authors: Tunapuna-Piarco


Performance Evaluation and Comparison of a New Regression Algorithm

Gooljar, Sabina, Manohar, Kris, Hosein, Patrick

arXiv.org Artificial Intelligence

In recent years, Machine Learning algorithms, in particular supervised learning techniques, have been shown to be very effective in solving regression problems. We compare the performance of a newly proposed regression algorithm against four conventional machine learning algorithms, namely Decision Trees, Random Forests, k-Nearest Neighbours, and XGBoost. The proposed algorithm was presented in detail in a previous paper, but detailed comparisons were not included there. We provide an in-depth comparison, using the Mean Absolute Error (MAE) as the performance metric, on a diverse set of datasets to illustrate the potential and robustness of the proposed approach. Readers can replicate our results: the source code is provided in a GitHub repository and the datasets are publicly available.
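A comparison of this kind can be sketched with scikit-learn. The dataset and models below are illustrative stand-ins, not the paper's actual benchmark; XGBoost is omitted here to keep the sketch dependency-free.

```python
# Minimal sketch: compare baseline regressors by Mean Absolute Error (MAE)
# on a synthetic dataset, in the spirit of the comparison described above.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=0.5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "k-NN": KNeighborsRegressor(n_neighbors=5),
}

maes = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    maes[name] = mean_absolute_error(y_te, model.predict(X_te))
```

A real replication would substitute the paper's datasets and add the proposed algorithm and XGBoost to the `models` dictionary.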


Exploring Supervised Machine Learning for Multi-Phase Identification and Quantification from Powder X-Ray Diffraction Spectra

Greasley, Jaimie, Hosein, Patrick

arXiv.org Artificial Intelligence

Powder X-ray diffraction analysis is a critical component of materials characterization methodologies. Discerning characteristic Bragg intensity peaks and assigning them to known crystalline phases is the first qualitative step in evaluating diffraction spectra. Subsequent to phase identification, Rietveld refinement may be employed to extract the abundance of quantitative, material-specific parameters hidden within powder data. These characterization procedures are, however, time-consuming and inhibit efficiency in materials science workflows. The growing popularity of data science techniques has provided an obvious route towards automating materials analysis. Deep learning has become a prime focus for predicting crystallographic parameters and features from X-ray spectra. However, the infeasibility of curating large, well-labelled experimental datasets means that one must resort to a large number of theoretical simulations for powder data augmentation in order to effectively train deep models. Herein, we are interested in conventional supervised learning algorithms in lieu of deep learning for multi-label crystalline phase identification and quantitative phase analysis for a biomedical application. First, models were trained using very limited experimental data. We then incorporated simulated XRD data to assess model generalizability as well as the efficacy of simulation-based training for predictive analysis in a real-world X-ray diffraction application.
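The multi-label setup described above (each sample may contain several crystalline phases at once) can be sketched with a conventional learner. The data here is synthetic and illustrative; the choice of random forest as the base classifier is an assumption, not necessarily the paper's model.

```python
# Hedged sketch: multi-label phase identification with a conventional
# supervised learner rather than a deep model. Each column of Y marks the
# presence (1) or absence (0) of one hypothetical crystalline phase.
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

X, Y = make_multilabel_classification(n_samples=300, n_features=20,
                                      n_classes=4, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.25, random_state=0)

# One binary classifier per phase, wrapped for multi-label output.
clf = MultiOutputClassifier(RandomForestClassifier(n_estimators=50, random_state=0))
clf.fit(X_tr, Y_tr)
score = f1_score(Y_te, clf.predict(X_te), average="micro")
```

In the paper's setting, `X` would be (binned) diffraction spectra, experimental or simulated, and simulated samples could be added to `X_tr` to test generalizability.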


An Optimization-Based Supervised Learning Algorithm for PXRD Phase Fraction Estimation

Hosein, Patrick, Greasley, Jaimie

arXiv.org Artificial Intelligence

In powder diffraction data analysis, phase identification is the process of determining the crystalline phases in a sample using its characteristic Bragg peaks. For multiphasic spectra, we must also determine the relative weight fraction of each phase in the sample. Machine Learning algorithms (e.g., Artificial Neural Networks) have been applied to perform such difficult tasks in powder diffraction analysis, but typically require a significant number of training samples for acceptable performance. We have developed an approach that performs well even with a small number of training samples. We apply a fixed-point iteration algorithm on the labelled training samples to estimate monophasic spectra. Then, given an unknown sample spectrum, we again use a fixed-point iteration algorithm to determine the weighted combination of monophase spectra that best approximates the unknown sample spectrum. These weights are the desired phase fractions for the sample. We compare our approach with several traditional Machine Learning algorithms.
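The second stage described above, finding the weighted combination of monophase spectra that best approximates an unknown sample spectrum, can be sketched as a fixed-point iteration. The multiplicative update below (a Lee-Seung-style rule) is an illustrative stand-in, not the paper's exact algorithm.

```python
# Hedged sketch: given reference monophase spectra (rows of M) and an
# unknown sample spectrum s, iterate a multiplicative fixed-point update so
# that the non-negative weights w make M.T @ w approximate s. Normalized
# weights play the role of phase fractions.
import numpy as np

def estimate_phase_fractions(M, s, n_iter=1000, eps=1e-12):
    """M: (n_phases, n_points) monophase spectra; s: (n_points,) sample."""
    w = np.full(M.shape[0], 1.0 / M.shape[0])   # uniform initial guess
    for _ in range(n_iter):
        numer = M @ s                 # correlation of each phase with s
        denom = M @ (M.T @ w) + eps   # current reconstruction, projected
        w *= numer / denom            # multiplicative update keeps w >= 0
    return w / w.sum()                # normalize to phase fractions

# Toy check: a 60/40 mixture of two synthetic "spectra".
rng = np.random.default_rng(0)
M = rng.random((2, 100))
s = 0.6 * M[0] + 0.4 * M[1]
w = estimate_phase_fractions(M, s)    # w should recover roughly (0.6, 0.4)
```

The multiplicative form has a fixed point at the non-negative least-squares solution, which for an exact mixture is the true fraction vector.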


A Data Science Approach to Risk Assessment for Automobile Insurance Policies

Hosein, Patrick

arXiv.org Artificial Intelligence

In order to determine a suitable automobile insurance premium, one must take into account three factors: the risk associated with the drivers and cars on the policy, the operational costs of managing the policy, and the desired profit margin. The premium should then be some function of these three values. We focus on risk assessment using a Data Science approach. Instead of using the traditional frequency and severity metrics, we predict the total claims that a new customer will make using historical data from current and past policies. Given multiple features of the policy (age and gender of drivers, value of car, previous accidents, etc.), one can potentially provide personalized insurance policies based specifically on these features as follows. We can compute the average claims made per year for all past and current policies with identical features and then take an average over these claim rates. Unfortunately, there may not be sufficient samples to obtain a robust average. We can instead include policies that are "similar" in order to obtain sufficient samples for a robust average. We therefore face a trade-off between personalization (using only closely similar policies) and robustness (extending the domain far enough to capture sufficient samples). This is the well-known Bias-Variance Trade-off. We model this problem, determine the optimal trade-off between the two (i.e., the balance that yields the highest prediction accuracy), and apply it to the claim-rate prediction problem. We demonstrate our approach using real data.
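The personalization/robustness trade-off described above can be sketched as a nearest-neighbour average whose neighbourhood size k controls the balance. The features, data, and distance metric below are hypothetical; the paper's actual model of the trade-off is more formal.

```python
# Hedged sketch: predict a new policy's claim rate as the average over its
# k most similar historical policies. Small k is personalized but
# high-variance; large k is robust but biased toward the population mean.
import numpy as np

rng = np.random.default_rng(1)
# Synthetic policies: 3 scaled features (e.g. driver age, car value, history)
# and a noisy claim rate that depends on them.
X = rng.random((400, 3))
rates = X @ np.array([0.5, 0.3, 0.2]) + rng.normal(0.0, 0.05, 400)

def knn_claim_rate(x_new, X, rates, k):
    """Average claim rate over the k policies most similar to x_new."""
    dist = np.linalg.norm(X - x_new, axis=1)   # Euclidean similarity
    nearest = np.argsort(dist)[:k]
    return rates[nearest].mean()

x_new = np.array([0.5, 0.5, 0.5])
preds = {k: knn_claim_rate(x_new, X, rates, k) for k in (1, 10, 100)}
```

In practice, k (or an equivalent similarity radius) would be chosen on held-out data to maximize prediction accuracy, which is exactly the optimal bias-variance balance the abstract refers to.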


Scalable Causal Learning for Predicting Adverse Events in Smart Buildings

Basak, Aniruddha (Carnegie Mellon University, Silicon Valley Campus) | Mengshoel, Ole (Carnegie Mellon University, Silicon Valley Campus) | Hosein, Stefan (University of the West Indies, St. Augustine) | Martin, Rodney (NASA Ames Research Center)

AAAI Conferences

Emerging smart buildings, such as the NASA Sustainability Base (SB), have a broad range of energy-related systems, including systems for heating and cooling. While the innovative technologies found in SB and similar smart buildings have the potential to increase the usage of renewable energy, they also add substantial technical complexity. Consequently, managing a smart building can be a challenge compared to managing a traditional building, sometimes leading to adverse events including unintended thermal discomfort of occupants (“too hot” or “too cold”). Fortunately, today’s smart buildings are typically equipped with thousands of sensors, controlled by Building Automation Systems (BASs). However, manually monitoring a BAS time series data stream with thousands of values may lead to information overload for the people managing a smart building. We present here a novel technique, Scalable Causal Learning (SCL), that integrates dimensionality reduction and Bayesian network structure learning techniques. SCL solves two problems associated with the naive application of dimensionality reduction and causal machine learning techniques to BAS time series data: (i) using autoregressive methods for causal learning can lead to induction of spurious causes and (ii) inducing a causal graph from BAS sensor data using existing graph structure learning algorithms may not scale to large data sets. Our novel SCL method addresses both of these problems. We test SCL using time series data from the SB BAS, comparing it with a causal graph learning technique, the PC algorithm. The causal variables identified by SCL are effective in predicting adverse events, namely abnormally low room temperatures, in a conference room in SB. Specifically, the SCL method performs better than the PC algorithm in terms of false alarm rate, missed detection rate and detection time.